Store everest results in ERT storage #9161

Merged

Contributor

@yngve-sk yngve-sk commented Nov 6, 2024

Issue
Resolves #8811

Base idea/documentation:

Store datasets by [batch, realization, perturbation] x [controls, objectives, constraints, objective_gradient, constraint_gradient]:

Exhaustive list of data stored PER BATCH:

  • batch.json - info about the batch: the batch_id and whether it is an improvement (a.k.a. the merit flag; the concepts are now unified for dakota and non-dakota runs)
  • batch_constraints - constraint values (and violations), batch-wide
  • batch_objectives - objective values, batch-wide
  • realization_controls - control values for geo-realizations, also includes simulation_id
  • realization_objectives - objective values per geo-realization
  • realization_constraints - constraint values per geo-realization
  • perturbation_objectives - objective and control values per perturbation
  • perturbation_constraints - constraint and control values per perturbation (Note/discussion point: control values could be pulled into a separate table to avoid redundancy)
  • batch_objective_gradient - partial derivatives of the objectives with respect to the controls. This dataset has one column per objective and one row per control value; each cell is the partial derivative of that objective wrt that control value.
  • batch_constraint_gradient - partial derivatives of the constraints with respect to the controls. This dataset has one column per constraint and one row per control value; each cell is the partial derivative of that constraint wrt that control value.
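
To make the gradient layout concrete, here is a minimal sketch in plain Python (no polars). It is not the code from this PR: the helper `gradient_table`, the toy objectives `distance` and `total`, and the control names `point_x` and `point_y` are all assumptions made for illustration. It builds one row per control and one column per objective, where each cell is a central finite-difference estimate of the partial derivative.

```python
from typing import Callable, Dict, List

Objective = Callable[[Dict[str, float]], float]


def gradient_table(
    objectives: Dict[str, Objective],
    controls: Dict[str, float],
    step: float = 1e-6,
) -> List[Dict[str, object]]:
    """Return one row per control and one column per objective."""
    rows: List[Dict[str, object]] = []
    for control_name, value in controls.items():
        row: Dict[str, object] = {"control_name": control_name}
        for objective_name, func in objectives.items():
            # Perturb a single control in both directions, keep the rest fixed.
            plus = dict(controls, **{control_name: value + step})
            minus = dict(controls, **{control_name: value - step})
            row[objective_name] = (func(plus) - func(minus)) / (2 * step)
        rows.append(row)
    return rows


# Toy objectives and controls (hypothetical, for illustration only).
objectives = {
    "distance": lambda c: -((c["point_x"] - 0.5) ** 2 + (c["point_y"] - 0.5) ** 2),
    "total": lambda c: c["point_x"] + 2.0 * c["point_y"],
}
controls = {"point_x": 0.25, "point_y": 0.75}
table = gradient_table(objectives, controls)
```

Each entry of `table` corresponds to one row of the dataset, so `table[0]["distance"]` is the partial derivative of `distance` wrt `point_x`. The constraint gradient dataset would have the same shape, with constraints in place of objectives.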

Example data from math_func/config_advanced.yml (json format)
[Image: Screenshot 2025-01-10 at 14 53 04]

Exhaustive list of data stored PER OPTIMIZATION

  • controls.json - control values for this batch
  • realization_weights.json - realization weights
  • nonlinear_constraints - conditions for constraints to satisfy (on average over the batch)
  • objective_functions - objective function names, weights, and normalization
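
As a rough illustration of how the per-optimization data relates to the per-batch data, the sketch below assumes that a batch-wide value is the weighted average of the per-realization values, using the realization weights, and that a constraint is checked against its bound on that average. This aggregation rule, the helper `batch_value`, and all numbers are assumptions for illustration; they are not taken from this PR.

```python
from typing import Dict


def batch_value(per_realization: Dict[int, float], weights: Dict[int, float]) -> float:
    """Weighted average of per-realization values (assumed aggregation rule)."""
    total_weight = sum(weights[r] for r in per_realization)
    if total_weight == 0:
        raise ValueError("realization weights sum to zero")
    return sum(weights[r] * v for r, v in per_realization.items()) / total_weight


# Made-up values keyed by geo-realization.
realization_weights = {0: 0.25, 1: 0.75}
realization_objectives = {0: 1.0, 1: 3.0}
realization_constraints = {0: 1.0, 1: 2.0}

batch_objective = batch_value(realization_objectives, realization_weights)

# Hypothetical upper bound that the constraint must satisfy on average.
upper_bound = 2.0
constraint_average = batch_value(realization_constraints, realization_weights)
violation = max(0.0, constraint_average - upper_bound)
```

Under these assumptions, `batch_objective` plays the role of a row in batch_objectives, while `constraint_average` and `violation` play the role of the value and violation in batch_constraints.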

Example data from math_func/config_advanced.yml
[Image: Screenshot 2025-01-10 at 15 00 29]

Potential simplifications

The everest_data_api is currently used for plotting. It could also be used (and probably expanded a bit) to avoid direct (polars) dataframe manipulations elsewhere in the code; for now, those manipulations are still done directly in the code.

@codecov-commenter

codecov-commenter commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 98.44156% with 6 lines in your changes missing coverage. Please review.

Project coverage is 91.84%. Comparing base (957b377) to head (af4a12a).
Report is 12 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/everest/everest_storage.py | 98.42% | 5 Missing ⚠️ |
| src/everest/api/everest_data_api.py | 97.56% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9161      +/-   ##
==========================================
+ Coverage   91.67%   91.84%   +0.16%     
==========================================
  Files         424      424              
  Lines       26511    26748     +237     
==========================================
+ Hits        24305    24566     +261     
+ Misses       2206     2182      -24     

| Flag | Coverage Δ | |
|---|---|---|
| cli-tests | 39.94% <0.00%> (+0.24%) | ⬆️ |
| everest-models-test | 34.71% <55.32%> (+0.64%) | ⬆️ |
| gui-tests | 74.32% <0.00%> (+0.07%) | ⬆️ |
| integration-test | 38.45% <85.45%> (+0.61%) | ⬆️ |
| performance-tests | 51.69% <0.00%> (+0.21%) | ⬆️ |
| test | 39.25% <97.14%> (+0.06%) | ⬆️ |
| unit-tests | 74.14% <26.23%> (-0.05%) | ⬇️ |


@yngve-sk yngve-sk force-pushed the 24.10.25.store-everest-opt-results-in-ertstorage branch 14 times, most recently from 7081e58 to 32193bd Compare November 13, 2024 11:20
@yngve-sk yngve-sk changed the title (wip) Store everest results in ERT storage Store everest results in ERT storage Nov 13, 2024
@yngve-sk yngve-sk marked this pull request as ready for review November 13, 2024 11:26
@yngve-sk yngve-sk force-pushed the 24.10.25.store-everest-opt-results-in-ertstorage branch 13 times, most recently from 5b54ee7 to dda3db9 Compare November 15, 2024 13:06

mapping = {}
for d in dummy_df.select("realization", "simulation_id").to_dicts():
    # Currently we work with str, but should maybe not be done in future
Contributor

Do you need to keep this comment?

Contributor Author

removed
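
The loop body is not visible in the excerpt above, so what `mapping` actually holds in the PR is unknown. Purely as an assumption, one plausible shape for such a lookup is sketched here: the rows stand in for the list of dicts that `.to_dicts()` returns, the values are made up, and the grouping of simulation ids per realization is a guess for illustration.

```python
from typing import Dict, List

# Stand-in for the rows returned by `.to_dicts()` (made-up values).
rows: List[Dict[str, int]] = [
    {"realization": 0, "simulation_id": 0},
    {"realization": 1, "simulation_id": 1},
    {"realization": 0, "simulation_id": 2},
]

# Hypothetical lookup: realization -> simulation ids evaluated for it.
mapping: Dict[int, List[int]] = {}
for row in rows:
    mapping.setdefault(row["realization"], []).append(row["simulation_id"])
```

Keeping the keys as integers here sidesteps the string conversion that the now-removed comment referred to.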

def _store_gradient_results(self, results: FunctionResults) -> _GradientResults:
    perturbation_objectives = polars.from_pandas(
        results.to_dataframe("evaluations").reset_index()
    ).drop("plan_id")
Contributor

no drop if possible.

    # expected to be None?
    batch_objective_gradient = polars.from_pandas(
        results.to_dataframe("gradients").reset_index()
    ).drop("plan_id")
Contributor

no drop if possible.

@yngve-sk yngve-sk force-pushed the 24.10.25.store-everest-opt-results-in-ertstorage branch 2 times, most recently from 005c29e to 54c9c5c Compare January 14, 2025 07:38
Contributor

@verveerpj verveerpj left a comment

LGTM!

@yngve-sk yngve-sk force-pushed the 24.10.25.store-everest-opt-results-in-ertstorage branch 2 times, most recently from ac09579 to af4a12a Compare January 15, 2025 06:50
def _enforce_dtypes(df: polars.DataFrame) -> polars.DataFrame:
    dtypes = {
        "batch_id": polars.UInt32,
        "result_id": polars.UInt32,
Contributor

Could you drop all instances of storing result_id? I do not think it is needed anywhere and I am considering changing/dropping it in ropt, so it would be better not to depend on it.

@yngve-sk yngve-sk force-pushed the 24.10.25.store-everest-opt-results-in-ertstorage branch 2 times, most recently from 35d6861 to 1bd76d6 Compare January 15, 2025 14:22
@yngve-sk yngve-sk force-pushed the 24.10.25.store-everest-opt-results-in-ertstorage branch from 1bd76d6 to 6bef204 Compare January 15, 2025 14:30
@verveerpj verveerpj merged commit 8fe4819 into equinor:main Jan 15, 2025
36 checks passed
Successfully merging this pull request may close these issues.

Refactor communication and storage of optimization results
4 participants